Diet and the evolution of ADH7 across seven orders of mammals

Dietary variation within and across species drives the eco-evolutionary responsiveness of genes necessary to metabolize nutrients and other components. Recent evidence from humans and other mammals suggests that sugar-rich diets of floral nectar and ripe fruit have favoured mutations in, and functional preservation of, the ADH7 gene, which encodes the ADH class 4 enzyme responsible for metabolizing ethanol. Here we interrogate a large, comparative dataset of ADH7 gene sequence variation, including that underlying the amino acid residue located at the key site (294) that regulates the affinity of ADH7 for ethanol. Our analyses span 171 mammal species, including 59 newly sequenced. We report extensive variation, especially among frugivorous and nectarivorous bats, with potential for functional impact. We also report widespread variation in the retention and probable pseudogenization of ADH7. However, we find little statistical evidence of an overarching impact of dietary behaviour on putative ADH7 function or presence of derived alleles at site 294 across mammals, which suggests that the evolution of ADH7 is shaped by complex factors. Our study reports extensive new diversity in a gene of longstanding ecological interest, offers new sources of variation to be explored in functional assays in future study, and advances our understanding of the processes of molecular evolution.

This lowers the Km (i.e. increases the enzyme affinity) for small alcohols like ethanol, while simultaneously disfavouring larger alcohols such as geraniol. Secondly, we predict that the ADH7 genes of highly insectivorous, carnivorous or herbivorous species, which are less likely to be exposed to dietary ethanol, are pseudogenized through frameshift mutations or mutations leading to premature stop codons due to relaxed selection. To test these predictions, we examined ADH7 gene sequences of multiple groups of mammalian species including treeshrews, rodents, leaf-nosed bats and marsupials to study variation at site 294 and potential pseudogenes. Furthermore, we ran phylogenetic generalized linear models to test for statistically significant effects of frugivorous and/or nectarivorous diets on (i) variation at site 294 and/or (ii) putative ADH7 gene functionality versus pseudogenization. Our research expands on data generated by a recent study [15] by including ADH7 sequences for an additional 89 ecologically relevant species, of which 59 are newly sequenced in this study, allowing us to better test adaptive hypotheses while controlling for phylogenetic effects.

Sample collection and DNA extraction
We generated new data from DNA derived from blood and tissue samples of 59 mammalian species: 27 bat, nine treeshrew, 11 rodent, four opossum and eight additional small to mid-sized mammal species from Central and South America, the Caribbean and Southeast Asia. We extracted genomic DNA from all samples using the DNeasy Blood and Tissue Kit (Qiagen) following the manufacturer's instructions. A list of sampled species, country of origin and DNA sequencing approach used is provided in table 1. Further details on sample collection, exportation and importation and permits are provided in the Ethics statement.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 230451
Table 1. The species included in this study and the country of origin for samples along with the sequencing approach used for the ADH7 gene, i.e. target capture plus massive parallel sequencing, or Sanger sequencing following PCR amplification of exon 7.
While the target capture approach allowed for the efficient investigation of most of the ADH7 gene for a large number of species, Illumina sequencing has relatively high error rates and the effectiveness of the baits is not guaranteed for species without a close genomic reference available [31]. We therefore used Sanger sequencing for a subset of samples for which sufficient quantities of DNA were available, to validate the sequences obtained via massive parallel sequencing. For the target capture approach, we used a custom set of biotinylated RNA probes (myBaits, Arbor Biosciences, Ann Arbor, MI) designed to capture the ADH7 coding region in a wide range of mammals. Complementary RNA baits were designed using exons of ADH7 coding sequences (mRNA) from species closely related to those of interest, which were retrieved from annotated genomes from the National Center for Biotechnology Information (NCBI) (electronic supplementary material, table S1).
The RNA bait design used 3× tiling, yielding 35 938 baits after standard filtering criteria were applied. Sequencing libraries were prepared by the University Core DNA (UCDNA) services at the University of Calgary with the NEBNext Ultra II FS Library Prep Kit (New England BioLabs). Following library preparation, we used complementary biotinylated RNA baits to capture ADH7 gene sequences for species of interest, following the manufacturer's protocol. Hybridized library sequences were denatured, amplified and sequenced by massive parallel sequencing on an Illumina NextSeq 500 using a 2 × 150 paired end NextSeq 500/550 mid-output v. 2.5 (300 Cycles) kit by the UCDNA services at the University of Calgary. For the second approach, we first amplified exon 7 of ADH7 using custom-designed primers (electronic supplementary material, tables S2 and S3). We then purified and Sanger sequenced the product at the University of Calgary Centre for Health Genomics and Informatics using the same primers on an Applied Biosystems 3730xl (96 capillary) Genetic Analyzer using BigDye Terminator chemistry. We found high concordance between the sequences generated by these two methods. For example, of the six treeshrew species for which we have both Sanger and Illumina sequences, exon 7 is 100% identical for four species and 99.24% identical for the remaining two species (one base difference in each case). In one species (U. everetti) a base was called as a C by Sanger sequencing and a T in the consensus sequence of the Illumina short-reads. Examining the short-reads shows support for both T and C at roughly 50% each, indicating a polymorphic site at which each method called a different base. In the other case (T. tana) there is no evidence of a polymorphism, but the sequences are derived from two different individuals, so the one base difference could be due to individual variation. In both cases the variable bases are in the third position of a codon and do not change the encoded amino acid sequence.
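The concordance check described above can be illustrated with a minimal Python sketch. It computes per cent identity between two aligned sequences and reports the codon position of each mismatch. The toy sequences are hypothetical: a length of 132 bases is chosen only because one mismatch in 132 bases reproduces the reported 99.24%, and the reading frame is assumed to start at the first base. Neither the exon length nor the frame is stated in the text.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def mismatch_codon_positions(seq_a: str, seq_b: str) -> list[int]:
    """Codon position (1, 2 or 3) of each mismatch, assuming index 0 is codon position 1."""
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [i % 3 + 1 for i, (a, b) in enumerate(pairs) if a != b]


# Hypothetical toy data: 44 repeats of the alanine codon GCT (132 bases).
sanger = "GCT" * 44
# Change index 5 (a third codon position) from T to C: GCT -> GCC, both alanine.
illumina = sanger[:5] + "C" + sanger[6:]

identity = percent_identity(sanger, illumina)
positions = mismatch_codon_positions(sanger, illumina)
print(f"{identity:.2f}% identical; mismatch codon positions: {positions}")
```

A mismatch at codon position 3 is often, though not always, synonymous, so a translation check would still be needed to confirm that the amino acid is unchanged.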

Assembly and alignment
We removed adapters and trimmed Illumina short-reads using BBDuk from the BBTools suite [32]. We removed bases with quality scores below Q = 10 from the left and right ends of reads and only retained reads that were at least 25 bases long after trimming. We used the pipeline HybPiper (https://github.com/mossmatters/HybPiper/wiki) to extract and assemble gene sequences from the targeted enrichment sequencing reads [33]. Within HybPiper, sequence reads are mapped to reference gene sequences (such as those used to design the baits) with BWA, sorted based on similarity, and then assembled into contigs on a gene-by-gene basis with SPAdes [34]. Exonerate is then used to align contigs against the reference gene sequences and extract coding regions. During analyses, we discovered the annotation of multiple ADH7 genes in newer chromosome-level genome assemblies now available for some bat species [35]. The presence of a second ADH7 gene in bats was not known at the time we designed the baits for our target capture sequencing, complicating our assembly pipeline. As expected with short-read sequencing data, attempts to fully assemble these paralogues separately from our reads were not successful; however, variation in bat ADH7 sequences at site 294 could be determined by visually inspecting the mapped short reads in IGV [36].
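The trimming rule stated above (remove end bases below Q = 10, keep reads of at least 25 bases) can be sketched in plain Python as follows. This is a simplified illustration of the rule only, not BBDuk's actual trimming algorithm, which may place trim points differently. The function name `trim_read` and the example read are hypothetical.

```python
from typing import Optional


def trim_read(
    seq: str, quals: list[int], min_q: int = 10, min_len: int = 25
) -> Optional[tuple[str, list[int]]]:
    """Trim bases below min_q from both ends; return None if the read ends up too short."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have the same length")
    start, end = 0, len(seq)
    # Walk inwards from the left end while quality is below the threshold.
    while start < end and quals[start] < min_q:
        start += 1
    # Walk inwards from the right end while quality is below the threshold.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    if end - start < min_len:
        return None  # read discarded by the minimum-length filter
    return seq[start:end], quals[start:end]


read = "ACGTACGTAC" * 3  # hypothetical 30-base read

# Two low-quality bases at each end: 26 bases remain, so the read is kept.
kept = trim_read(read, [2, 5] + [30] * 26 + [8, 3])

# Only 15 high-quality bases in the middle: the read is discarded.
dropped = trim_read(read, [2] * 10 + [30] * 15 + [2] * 5)
```

Internal low-quality bases are left untouched in this sketch because only the read ends are scanned.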
To assemble reads generated via Sanger sequencing, forward and reverse reads were trimmed to remove bases with poor base call quality, and contigs were created with the software platform Geneious Prime (2021.1.1; https://www.geneious.com). Chromatograms were inspected for any polymorphic bases at site 294. We aligned all sequences newly generated for this study via Sanger and Illumina sequencing, along with 85 ADH7 sequences curated in a previous study [15] and 30 sequences newly mined from NCBI (electronic supplementary material, table S4), using MAFFT on the software platform Geneious Prime (2021.1.1), and cleaned the alignment manually. Stop codons found in sequences assembled via HybPiper were manually confirmed by inspecting the mapped short-reads with IGV [36] and only those confirmed by visual inspection were considered correct.
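Before manual confirmation, candidate premature stop codons can be flagged with a simple in-frame scan. The sketch below is a minimal illustration, not the procedure used in this study. It assumes an ungapped coding sequence that starts at codon position 1 and ends with the expected terminal stop codon. The function names and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> list[int]:
    """Return 1-based codon numbers of in-frame stops that precede the final codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [
        i + 1
        for i in range(n_codons - 1)  # the final codon is the expected stop
        if cds[3 * i : 3 * i + 3] in STOP_CODONS
    ]


def length_breaks_frame(cds: str) -> bool:
    """True if the length is not a multiple of three (a possible frameshift)."""
    return len(cds) % 3 != 0


intact = "ATGGCTGCTTAA"     # Met-Ala-Ala-stop
disrupted = "ATGTGAGCTTAA"  # TGA stop at codon 2

intact_hits = premature_stops(intact)
disrupted_hits = premature_stops(disrupted)
```

A hit from such a scan is only a candidate. Assembly errors can introduce spurious stops, which is why inspection of the mapped reads remains necessary.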
We retrieved some portion of ADH7 for all 59 species newly sequenced. Of these, we were able to retrieve sequence data at the site of interest (294) for 55 species, excluding the hog-nosed skunk (Conepatus semistriatus), kinkajou (Potos flavus), moonrat (Echinosorex gymnura) and Himalayan water shrew (Chimarrogale himalayica). Through Sanger sequencing, we retrieved sequence data for exon 7 of ADH7 for 19 out of 21 species, but could not retrieve sequences with high base call quality for Gervais's fruit-eating bat (Artibeus cinereus) and the greater round-eared bat (Tonatia bidens).

Using phylogenetic generalized linear models to test for correlations with diet
To test if a frugivorous and/or nectarivorous diet correlates with (i) retention of a functional ADH7 gene and/or (ii) substitutions to amino acids other than alanine at site 294 of the ADH class IV enzyme (encoded within exon 7 of ADH7) we ran phylogenetic generalized linear models (PGLMs). PGLMs are a modification of generalized linear models that consider the expected covariance structure of residuals, i.e. due to relatedness among species, to generate modified slope and intercept estimates that can account for autocorrelation due to phylogeny [37]. For each model we used the dietary proportions (specifically, per cent fruit and/or nectar in the diet) retrieved from EltonTraits 1.0 [38] as the predictor variable. We additionally ran the models with fruit and/or nectar in the diet as a binary predictor, and our results do not differ. (For the sake of brevity, we report only the results generated by the former practice based on the per cent fruit/nectar in the diet.) To test our first hypothesis, we used a binary classification of the ADH7 gene status (retention/pseudogenization; n = 166) as the dependent variable, and to test our second hypothesis we used the presence of a substitution at site 294 (yes/no) as a binary dependent variable (n = 133). For species with more than one ADH7 gene, the species was included as having a substitution if at least one of the ADH7 genes had an amino acid other than alanine at site 294. Because the majority of variation at site 294 was found within bats, we repeated the substitution analysis for bats only. We ran each PGLM using both the maximum penalized likelihood estimation (MPLE) and the IG10 methods, which differ slightly in how the calculated likelihood is penalized, following established protocols [15,39] using the function phyloglm within the R package phylolm [40]. Phylogenetic relationships among study species were based on TimeTree [41].
Missing data ('unknown', figure 2) were dropped prior to running the models.
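The coding of the binary response for site 294 can be sketched as follows. This is a minimal Python illustration of the rule described above, not the code used in this study. The residues listed are those reported in the text for three species, and the kinkajou is included as an example of missing data. Treating a species with some paralogues unresolved as scoreable from its known residues is an assumption made here for illustration.

```python
from typing import Optional

ANCESTRAL = "A"  # alanine, the putative ancestral residue at site 294


def substitution_status(residues: list[Optional[str]]) -> Optional[int]:
    """1 if any paralogue has a non-alanine residue, 0 if all known are alanine, None if unknown."""
    known = [r for r in residues if r is not None]
    if not known:
        return None  # no data at site 294: dropped before modelling
    # Assumption: unresolved paralogues are ignored when others are known.
    return int(any(r != ANCESTRAL for r in known))


# One list entry per ADH7 paralogue; None marks a residue that could not be determined.
species_residues = {
    "Caluromys derbianus": ["A", "D", "L"],   # three paralogues
    "Ptilocercus lowii": ["A"],
    "Perognathus longimembris": ["S"],
    "Potos flavus": [None],                   # site 294 not recovered
}

response = {
    species: status
    for species, residues in species_residues.items()
    if (status := substitution_status(residues)) is not None
}
```

The resulting dictionary holds only species with a scoreable response, mirroring the removal of 'unknown' entries before the models are run.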

Diet and putative functional variation at site 294
We predicted that frugivorous and nectarivorous mammals would have substitutions at site 294 from the ancestral amino acid to a residue with high predicted affinity for ethanol. Alanine is the most likely candidate as the ancestral condition for mammals as it is present at site 294 in most species, including
Figure 2. Evolutionary history of ADH7 in mammals included in this study. Per cent fruit and/or nectar included in a species' diet is indicated in the first row of tip annotations. The second row indicates whether any substitutions away from the ancestral alanine are found at site 294 of ADH7 and the third row indicates whether the gene is putatively functional or pseudogenized. Red boxes with the letter psi (ψ) indicate the inferred lineage in which gene loss events occurred, based on putatively shared loss-of-function mutations. Phylogeny via TimeTree ([41]; http://timetree.org), plotted with the R package ggtree [42].
Marsupials have three paralogues of ADH7, as previously described [15]. All of the American marsupials (order Didelphimorphia) we investigated appear to follow the same pattern of having an alanine, aspartic acid and leucine in the three paralogues, respectively. (Coverage of the third gene in the brown-eared woolly opossum (Caluromys lanatus) was insufficient at site 294, but the closely related Derby's woolly opossum (Caluromys derbianus) did follow this pattern.) The koala (Phascolarctos cinereus) had a valine and isoleucine, but we could not conclusively identify the orthologous site in the third gene. In the Tasmanian devil (Sarcophilus harrisii) the pattern was alanine, alanine and asparagine, but we identified a premature stop codon in the second paralogue. As previously described [16], African great apes (Pan paniscus, Pan troglodytes, Homo sapiens and Gorilla gorilla) and the aye-aye (Daubentonia madagascariensis) have a valine at site 294. The pen-tailed treeshrew (Ptilocercus lowii), all other treeshrews and almost all remaining placental mammals we sequenced or mined from publicly available data possess an alanine at site 294 (figure 2). The only exception was the little pocket mouse (Perognathus longimembris), which has serine at the critical site. This was the only species we found to possess this residue.
We tested whether diet was a significant predictor of a substitution at site 294 of the ADH class IV enzyme(s) coded by ADH7 using a phylogenetic generalized linear model with the IG10 and MPLE logistic regressions [43]. We found no evidence that the percentage of fruit and/or nectar in the diet predicts substitution frequencies at site 294 across mammals overall (IG10: slope = −0.079, p-value = 0.848; MPLE: slope = 0.039, p-value = 0.926; figure 3a), controlling for the impact of phylogenetic relationships among species.
When limiting the analysis to bats, the same patterns were found, although the slope of the relationship is suggestive (figure 3b) and raises the possibility of an underpowered result that should be treated with caution (IG10: slope = 1.527, p-value = 0.087; MPLE: slope = 1.296, p-value = 0.141).
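For orientation, the slopes reported above are on the logit scale. A minimal way to write the mean structure of such a logistic model is sketched below, where y_i is the binary response for species i (for example, presence of a substitution at site 294) and x_i is the per cent fruit and/or nectar in its diet. The symbols β0, β1, x_i and y_i are notation introduced here for illustration, and the phylogenetic dependence among species that the PGLM accounts for is not shown.

```latex
\operatorname{logit}\,\Pr(y_i = 1)
  = \ln\frac{\Pr(y_i = 1)}{1 - \Pr(y_i = 1)}
  = \beta_0 + \beta_1 x_i
```

Under this form, a positive slope β1 means that the odds of the response increase with the dietary predictor, and a slope near zero means that the predictor carries little information about the response.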

Diet and putative gene functionality
We predicted that the ADH7 genes of species with low levels of dietary ethanol exposure, i.e. highly insectivorous, carnivorous or herbivorous species, would be pseudogenized through frameshift mutations or mutations leading to premature stop codons in the exons of ADH7 due to relaxed selection. We found evidence of potential premature stop codons in 32 species, including species which had been previously described [15]. This included inferred pseudogenization events in the ancestral lineages of extant bovids, cetaceans, elephantids and carnivores, and independently evolved premature stop codons in the nine-banded armadillo (Dasypus novemcinctus), white rhinoceros (Ceratotherium simum), horse (Equus caballus), common degu (Octodon degus), guinea pig (Cavia porcellus) and North American beaver (Castor canadensis). We newly identified premature stop codons in the assemblies of the Spanish mole (Talpa occidentalis) and white-footed mouse (Peromyscus leucopus), and in the brown-throated sloth (Bradypus variegatus) and Hoffmann's two-toed sloth (Choloepus hoffmanni) sequenced for this project (figure 2, electronic supplementary material, table S5). Coverage and/or completeness of the assembled gene was too low to determine the status of ADH7 with confidence in four species: the aardvark (Orycteropus afer), Himalayan water shrew (Chimarrogale himalayica), moonrat (Echinosorex gymnura) and white-nosed coati (Nasua narica). The sequence assembled from the raccoon (Procyon lotor) sample was most similar to Tupaia and primates, casting doubt on the identity of the sample, and it was removed from downstream analyses. We did not find a significant correlation between percentage of fruit and/or nectar in the diet and retention of a functional ADH7 gene using either the MPLE method (slope = −0.001, p-value = 0.998; figure 3c) or the IG10 method (slope = −0.966, p-value = 0.062).
As with the tests of site 294 variation in bats, the slope we generated for the IG10 model was not flat, so these tests might suffer from low power to detect an effect.

Discussion
Our aim was to evaluate the impact of diet on the evolution of a key gene in the ethanol metabolic pathway, ADH7, and to test the hypothesis that frugivorous and nectarivorous mammalian species have convergently evolved amino acid substitutions at functional site 294 that may enhance ethanol metabolic activity. Our findings are threefold: (i) most mammals possess alanine, the purported ancestral residue, at site 294 of the ADH class IV enzyme encoded by ADH7, and we report many new cases of substitutions at this site; (ii) frugivorous bats are inferred to have undergone an ADH7 gene duplication, and frugivorous, but not insectivorous, species had extensive variation at site 294, raising the possibility of adaptive variation; and (iii) we did not find a consistent, significant impact of the percentage of fruit/nectar in the diet on the presence of a substitution at site 294, or on retention of a functional ADH7 gene, although our confidence for rejecting this hypothesis is low in bats. We discuss these results in further detail below.

Diet and putative functional variation at site 294
Within the ADH class IV enzyme encoded by gene ADH7, site 294 is well documented for its role in substrate binding, and plays a key role in predicting ethanol metabolic activity. Our first prediction was that species that are primarily frugivorous and/or nectarivorous would possess an amino acid at site 294 of ADH7 that was likely to have high binding efficiency to ethanol, such as valine or another amino acid with similar properties, i.e. a relatively large side chain and hydrophobic nature [19]. This prediction was inspired by the observation that a substitution from alanine to valine in human ADH7 leads to a 40-fold increase in ethanol metabolism [16].
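The qualitative criterion in this prediction (a side chain larger than that of alanine, and hydrophobic) can be made concrete with a small Python sketch. It uses two standard reference scales, the Kyte-Doolittle hydropathy index and Zamyatnin residue volumes, for the residues mentioned in this study. The thresholds (hydropathy above zero, volume above that of alanine) are assumptions introduced here for illustration. The output is a rough heuristic flag, not a prediction of enzyme kinetics.

```python
# residue code -> (name, Kyte-Doolittle hydropathy, Zamyatnin volume in cubic angstroms)
RESIDUES = {
    "A": ("alanine", 1.8, 88.6),
    "V": ("valine", 4.2, 140.0),
    "I": ("isoleucine", 4.5, 166.7),
    "L": ("leucine", 3.8, 166.7),
    "G": ("glycine", -0.4, 60.1),
    "T": ("threonine", -0.7, 116.1),
    "S": ("serine", -0.8, 89.0),
    "D": ("aspartic acid", -3.5, 111.1),
    "N": ("asparagine", -3.5, 114.1),
}


def larger_and_hydrophobic(residue: str, reference: str = "A") -> bool:
    """Heuristic flag: hydropathy above zero and side-chain volume above the reference residue."""
    _, hydropathy, volume = RESIDUES[residue]
    _, _, reference_volume = RESIDUES[reference]
    # Illustrative thresholds only; they are not taken from the study.
    return hydropathy > 0 and volume > reference_volume


flagged = sorted(code for code in RESIDUES if larger_and_hydrophobic(code))
print([RESIDUES[code][0] for code in flagged])
```

Under these assumed thresholds only valine, isoleucine and leucine are flagged, while threonine, serine, glycine, aspartic acid and asparagine are not. Functional assays would be needed to establish their actual effects.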
We found that the majority of mammals in our study possess alanine at site 294, and this amino acid was reconstructed as the ancestral state. Interestingly, we also found a wide range of variation at this site, with evidence of phylogenetic clustering (figure 2). Frugivorous and nectarivorous bats possessed remarkable variation at site 294 of ADH7, and annotations of newer chromosome-level long-read genome assemblies of species in this order show two paralogues of ADH7. This putative duplication is found in species across the chiropteran phylogeny, in both Yinpterochiroptera and Yangochiroptera, suggesting that it evolved in the ancestor of extant bats. The presence of a second ADH7 gene in bats is a new discovery and was not known at the time we designed the baits for our target capture sequencing. Furthermore, short sequencing reads are often inadequate for the assembly of closely related and/or duplicated genes. This complicated our analysis pipeline and prevented full assembly of the ADH7 genes from the targeted sequencing in bats. However, we were able to identify amino acid variation at site 294 from visual inspection of the mapped short-reads.
While the insectivorous bats nearly uniformly retained alanine (with one exception, discussed below), many frugivorous and nectarivorous bats had amino acid substitutions in one or both ADH7 paralogues. We detected a minimum of nine independent transitions from alanine to other amino acids, and frugivorous/nectarivorous bats possessed four novel residues at site 294: valine, isoleucine, threonine and glycine. In accordance with our predictions of convergence on valine, which increases ethanol metabolism 40-fold in primate ADH7, Glossophaga soricina, a nectarivorous phyllostomid species, and Pteropus vampyrus, a frugivore of the family Pteropodidae, each independently evolved valine at site 294. Intriguingly, Orbach et al. [44] found that the flying abilities of wild Glossophaga soricina are not affected after ingesting ethanol (1.5% ethanol concentration), suggesting that this species may be able to metabolize ethanol efficiently. The frugivorous Samoan flying fox (Pteropus samoensis) and Egyptian fruit bat (Rousettus aegyptiacus, also family Pteropodidae) both possessed an isoleucine substitution at site 294. Isoleucine, like valine, is a non-polar amino acid and possesses a larger side chain than alanine, suggesting that its presence at the critical site would increase the ethanol metabolic efficiency. It is possible the last common ancestor of the genera Pteropus and Rousettus possessed the substitution to isoleucine, and that an additional shift later occurred in P. vampyrus. Finally, the Antillean fruit-eating bat (Brachyphylla cavernarum) has evolved a glycine in one paralogue, and three bat lineages have independently evolved a threonine at site 294 in one or both paralogues. Unlike the other amino acids discussed, it is unclear how threonine or glycine might impact the enzyme's ability to bind ethanol or interact with the NAD+ cofactor that binds nearby [20]; functional reconstitution and expression studies might shed further light on this question.
Considering other mammals, we replicated the finding of a valine substitution in humans and African great apes, and independently in aye-ayes, which are discussed in detail elsewhere [16,22]. Aye-ayes are best described as omnivorous, but are known to have a close relationship with the ethanolic nectar of the ravenala tree ([21]; Melin, Moritz, Guthrie, Johnson and Dominy 2017, unpublished data), suggesting that if diet selects for ADH7 substitutions, gross dietary classifications may be missing important nuances. As previously identified in other marsupials (koala, Tasmanian devil and gray short-tailed opossum), we found that all American opossum species in our sample had three ADH7 genes, which appear putatively functional: Derby's woolly opossum (Caluromys derbianus), gray short-tailed opossum (Monodelphis domestica) and the gray four-eyed opossum (Philander opossum) all possessed alanine, aspartic acid and leucine at site 294 across the paralogues. Leucine is non-polar and has a similar-sized side chain to isoleucine; its presence at the critical site probably leads to a relatively high binding affinity for ethanol. The functional variation of ADH7 in marsupials is unclear, and presents a promising avenue for future study.
Several of our results were not consistent with our first prediction. Two of three vampire bats, Diaemus youngi and Desmodus rotundus, possessed a valine at site 294, while the hairy-legged vampire bat (Diphylla ecaudata) retained an alanine. Given the dietary specialization on blood, it is possible that the ADH7 gene may be under relaxed selection in this lineage, or that other selective pressures are operating on it. For example, ADH genes have also been suggested to play roles in detoxification of other substances [45,46]. Finally, the retention of alanine in the pen-tailed treeshrew (Ptilocercus lowii) was surprising, as this species is renowned for consuming ethanol [12]. Given the behavioural and isotopic evidence of high metabolic activity for ethanol in treeshrews [12], it is possible that alternative molecular adaptations for metabolizing ethanol are in use, such as other alcohol dehydrogenases, aldehyde dehydrogenases, and enzymes in the microsomal ethanol oxidizing system (MEOS), catalase and non-oxidative pathways. Intriguingly, the glucuronidation pathway (a non-oxidative pathway) has been hypothesized to contribute to ethanol metabolism in treeshrews [12,47] and might be worth future scrutiny to examine signs of adaptive evolution.

Diet and putative gene functionality
Our second prediction was that the ADH7 genes of highly insectivorous, carnivorous/sanguinivorous, or herbivorous species will be pseudogenized due to relaxed selection, as they are less likely to be exposed to dietary ethanol. Our prediction was partially supported. We identified one or more independently derived putative premature stop codons among many non-frugivorous species, including the herbivorous Hoffmann's two-toed sloth (Choloepus hoffmanni) and brown-throated sloth (Bradypus variegatus). These newly sequenced species add to the species previously identified by Janiak et al. [15] to have pseudogenes, including many herbivores (bovids, elephants, rodents), carnivores (order Carnivora, cetaceans) and the insectivorous nine-banded armadillo (Dasypus novemcinctus). However, the phylogenetic generalized linear model we ran on all available species suggests the percentage of fruit and/or nectar in the diet was not significantly negatively correlated with possessing a putative ADH7 pseudogene, which suggests the biological effect is probably weak and other factors may be at play. Of the animals newly sequenced in our study with carnivorous evolutionary histories, we found the striped hog-nosed skunk (Conepatus semistriatus, omnivorous) and kinkajou (frugivorous) shared a stop codon also found in other carnivores, such as dogs, wolves and foxes. This suggests that shifts to a partially or majority frugivorous diet, following ancestry focused on meat eating, have not coincided with regain of ADH7 function for ethanol metabolism. It is possible that other genes have undergone neofunctionalization for ethanol metabolism, but we cannot test that hypothesis with our current dataset.

Conclusion and future directions
We report extensive new amino acid variation at kinetically important sites in the ADH7 gene of bats and marsupials of the Americas, and confirm previous reports of variation in primates. Furthermore, we found evidence of potential premature stop codons, primarily in species for which fruit and/or nectar consumption is low, indicating potential relaxation of selection when ethanol is not common in the diet. These results raise the intriguing possibility that some frugivorous/nectarivorous species are able to metabolize ethanol more efficiently than most mammals. While we did not find evidence that diet was a significant predictor of substitutions at site 294 or ADH7 gene functionality, our statistical tests might be underpowered. Future studies of positive selection (e.g. dN/dS) examining the complete coding sequence of the gene may be fruitful. Furthermore, our measure of the importance of ethanol in the diet is crude, and ethanol could be more important in the diet of some species than others, including those classified as omnivores, where overall fruit and nectar in the diet might be moderate. Finally, data on enzyme kinetics that test the efficiency of ADH class IV enzymes to evaluate the impacts of isoleucine, leucine, valine, threonine, serine and glycine at the critical site 294, in the context of the species-specific amino acid composition (e.g. [16]), would provide functional insight. Detailed study of genes involved in other ethanol metabolic pathways and of the ethanol content of wild foods consumed by the species reported here (e.g. [7]), as well as more nuanced examination of omnivorous species which can consume a great deal of nectar or fruit [30], would be instructive and shed further light on this intriguing topic.